{
 "metadata": {
  "name": "",
  "signature": "sha256:1dd0105a458c9c19900ecc0481ad11b3f00f6d65f367f94d807584641f7e17c8"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# Comparing machine learning models in scikit-learn\n",
      "*From the video series: [Introduction to machine learning with scikit-learn](https://github.com/justmarkham/scikit-learn-videos)*"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Agenda\n",
      "\n",
      "- How do I choose **which model to use** for my supervised learning task?\n",
      "- How do I choose the **best tuning parameters** for that model?\n",
      "- How do I estimate the **likely performance of my model** on out-of-sample data?"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Review\n",
      "\n",
      "- Classification task: Predicting the species of an unknown iris\n",
      "- Used three classification models: KNN (K=1), KNN (K=5), logistic regression\n",
      "- Need a way to choose between the models\n",
      "\n",
      "**Solution:** Model evaluation procedures"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Evaluation procedure #1: Train and test on the entire dataset"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "1. Train the model on the **entire dataset**.\n",
      "2. Test the model on the **same dataset**, and evaluate how well we did by comparing the **predicted** response values with the **true** response values."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# read in the iris data\n",
      "from sklearn.datasets import load_iris\n",
      "iris = load_iris()\n",
      "\n",
      "# create X (features) and y (response)\n",
      "X = iris.data\n",
      "y = iris.target"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 2
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Logistic regression"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# import the class\n",
      "from sklearn.linear_model import LogisticRegression\n",
      "\n",
      "# instantiate the model (using the default parameters)\n",
      "logreg = LogisticRegression()\n",
      "\n",
      "# fit the model with data\n",
      "logreg.fit(X, y)\n",
      "\n",
      "# predict the response values for the observations in X\n",
      "logreg.predict(X)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 3,
       "text": [
        "array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
        "       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
        "       0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1,\n",
        "       1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1,\n",
        "       1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,\n",
        "       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2,\n",
        "       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])"
       ]
      }
     ],
     "prompt_number": 3
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# store the predicted response values\n",
      "y_pred = logreg.predict(X)\n",
      "\n",
      "# check how many predictions were generated\n",
      "len(y_pred)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 4,
       "text": [
        "150"
       ]
      }
     ],
     "prompt_number": 4
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Classification accuracy:\n",
      "\n",
      "- **Proportion** of correct predictions\n",
      "- Common **evaluation metric** for classification problems"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# compute classification accuracy for the logistic regression model\n",
      "from sklearn import metrics\n",
      "print metrics.accuracy_score(y, y_pred)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.96\n"
       ]
      }
     ],
     "prompt_number": 5
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "- Known as **training accuracy** when you train and test the model on the same data"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### KNN (K=5)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sklearn.neighbors import KNeighborsClassifier\n",
      "knn = KNeighborsClassifier(n_neighbors=5)\n",
      "knn.fit(X, y)\n",
      "y_pred = knn.predict(X)\n",
      "print metrics.accuracy_score(y, y_pred)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.966666666667\n"
       ]
      }
     ],
     "prompt_number": 6
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### KNN (K=1)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "knn = KNeighborsClassifier(n_neighbors=1)\n",
      "knn.fit(X, y)\n",
      "y_pred = knn.predict(X)\n",
      "print metrics.accuracy_score(y, y_pred)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1.0\n"
       ]
      }
     ],
     "prompt_number": 7
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Problems with training and testing on the same data\n",
      "\n",
      "- Goal is to estimate likely performance of a model on **out-of-sample data**\n",
      "- But, maximizing training accuracy rewards **overly complex models** that won't necessarily generalize\n",
      "- Unnecessarily complex models **overfit** the training data"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "![Overfitting](images/05_overfitting.png)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "*Image Credit: [Overfitting](http://commons.wikimedia.org/wiki/File:Overfitting.svg#/media/File:Overfitting.svg) by Chabacano. Licensed under GFDL via Wikimedia Commons.*"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Evaluation procedure #2: Train/test split"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "1. Split the dataset into two pieces: a **training set** and a **testing set**.\n",
      "2. Train the model on the **training set**.\n",
      "3. Test the model on the **testing set**, and evaluate how well we did."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# print the shapes of X and y\n",
      "print X.shape\n",
      "print y.shape"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "(150L, 4L)\n",
        "(150L,)\n"
       ]
      }
     ],
     "prompt_number": 8
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# STEP 1: split X and y into training and testing sets\n",
      "from sklearn.cross_validation import train_test_split\n",
      "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=4)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 9
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "![Train/test split](images/05_train_test_split.png)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "What did this accomplish?\n",
      "\n",
      "- Model can be trained and tested on **different data**\n",
      "- Response values are known for the testing set, and thus **predictions can be evaluated**\n",
      "- **Testing accuracy** is a better estimate than training accuracy of out-of-sample performance"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# print the shapes of the new X objects\n",
      "print X_train.shape\n",
      "print X_test.shape"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "(90L, 4L)\n",
        "(60L, 4L)\n"
       ]
      }
     ],
     "prompt_number": 10
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# print the shapes of the new y objects\n",
      "print y_train.shape\n",
      "print y_test.shape"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "(90L,)\n",
        "(60L,)\n"
       ]
      }
     ],
     "prompt_number": 11
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# STEP 2: train the model on the training set\n",
      "logreg = LogisticRegression()\n",
      "logreg.fit(X_train, y_train)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 12,
       "text": [
        "LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n",
        "          intercept_scaling=1, penalty='l2', random_state=None, tol=0.0001)"
       ]
      }
     ],
     "prompt_number": 12
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# STEP 3: make predictions on the testing set\n",
      "y_pred = logreg.predict(X_test)\n",
      "\n",
      "# compare actual response values (y_test) with predicted response values (y_pred)\n",
      "print metrics.accuracy_score(y_test, y_pred)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.95\n"
       ]
      }
     ],
     "prompt_number": 13
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Repeat for KNN with K=5:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "knn = KNeighborsClassifier(n_neighbors=5)\n",
      "knn.fit(X_train, y_train)\n",
      "y_pred = knn.predict(X_test)\n",
      "print metrics.accuracy_score(y_test, y_pred)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.966666666667\n"
       ]
      }
     ],
     "prompt_number": 14
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Repeat for KNN with K=1:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "knn = KNeighborsClassifier(n_neighbors=1)\n",
      "knn.fit(X_train, y_train)\n",
      "y_pred = knn.predict(X_test)\n",
      "print metrics.accuracy_score(y_test, y_pred)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.95\n"
       ]
      }
     ],
     "prompt_number": 15
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Can we locate an even better value for K?"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# try K=1 through K=25 and record testing accuracy\n",
      "k_range = range(1, 26)\n",
      "scores = []\n",
      "for k in k_range:\n",
      "    knn = KNeighborsClassifier(n_neighbors=k)\n",
      "    knn.fit(X_train, y_train)\n",
      "    y_pred = knn.predict(X_test)\n",
      "    scores.append(metrics.accuracy_score(y_test, y_pred))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 16
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# import Matplotlib (scientific plotting library)\n",
      "import matplotlib.pyplot as plt\n",
      "\n",
      "# allow plots to appear within the notebook\n",
      "%matplotlib inline\n",
      "\n",
      "# plot the relationship between K and testing accuracy\n",
      "plt.plot(k_range, scores)\n",
      "plt.xlabel('Value of K for KNN')\n",
      "plt.ylabel('Testing Accuracy')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 17,
       "text": [
        "<matplotlib.text.Text at 0x1722eb00>"
       ]
      },
      {
       "metadata": {},
       "output_type": "display_data",
       "png": "iVBORw0KGgoAAAANSUhEUgAAAZEAAAEPCAYAAACDTflkAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xu0nHV97/H3JwkJSYCEkJiE3CZawEApN40cxeOuoidW\nEcELcFyFKtJ4VkGtnlWQdZZs9LQFKmgsFrGiYMvNitjoMXLRbFotIpAYIoQIuHdukBAgCQkEyOV7\n/nieSSaTfZnZe56ZZ2Y+r7X2yjPP9TfDw3zn9/v+fr9HEYGZmdlgDGt0AczMrHk5iJiZ2aA5iJiZ\n2aA5iJiZ2aA5iJiZ2aA5iJiZ2aBlGkQkzZP0uKQnJF3cy/ZDJd0paZmkByQdU7LtC5IelbRc0i2S\nRqXrOyWtlbQ0/ZuX5XswM7O+ZRZEJA0HrgXmAUcD50iaU7bbpcCSiDgOOBdYkB5bAC4AToyIY4Hh\nwNnpMQFcExEnpH8/y+o9mJlZ/7KsicwFnoyInojYAdwGnF62zxxgMUBErAQKkiYBLwI7gDGSRgBj\ngHUlxynDcpuZWYWyDCLTgDUlr9em60otA84EkDQXmAVMj4gXgKuB1cDTwOaIuLfkuIvSJrAbJI3P\n6g2YmVn/sgwilcyncgUwXtJS4EJgKbBL0huAzwIF4HDgIEkfS4+5DpgNHA88QxJszMysAUZkeO51\nwIyS1zNIaiN7RMRW4BPF15K6gT8A7wP+KyKeT9f/EHgrcHNEPFuy/7eBH/d2cUmeFMzMrEoRUVW6\nIMuayEPAEZIKkkYCZwELS3eQNC7dhqQLgPsiYhuwEjhZ0mhJAk4FHkv3m1pyijOA5X0VICL8F8Fl\nl13W8DLk4c+fgz8Lfxb9/w1GZjWRiNgp6ULgLpLeVTdExApJ89Pt15P02roxrTX8Djg/3fZbSd8j\nCUS7gSXAt9JTXynpeJLmsm5gflbvwczM+pdlcxYRsQhYVLbu+pLl+4Gj+jj2KuCqXtafW+NimpnZ\nIHnEehvo6OhodBFywZ/DXv4s9vJnMTQabDtY3kmKVn1vZmZZkETkKLFuZmYtzkHEzMwGzUHEzMwG\nLdPeWdac1q+H5X2OvjGrzuzZ8Ed/lP11Nm+GXbvgsMOyv5bt5cS67eeTn4Rf/xqmTh14X7P+bN0K\nw4fDr36V/bW++EXYtg2uuSb7a7WqwSTWXROx/fzhD/DVr8K7393oklizW7sW3vzm+lzrqafgpZfq\ncy3by0HE9tPTA4VCo0thrWDqVHjhBdi+HUaPzvZaPT0OIo3gxLrtY+dOWLcOZs5sdEmsFQwfntxL\nq1dnf62enuTP6stBxPbx9NMwaRKMGtXoklirKBSy/3J/9VV47rkksb55c7bXsn05iNg+urvdlGW1\nVSgk91WWVq2C6dOTnmBZX8v25SBi+3A+xGpt9uzsayI9Pcl16nEt25eDiO2j+D+jWa3Uozmr+OOn\nHteyfTmI2D5cE7FacxBpbQ4itg/nRKzW6pETKd639biW7cvjRGwfrolYrU2ZAi++CC+/DGPGZHON\nYjPs2LGuidSbayK2x86d8MwzMGNGo0tirWTYsGSsyKpV2V2jvDnLMx7Vj4OI7bF2LUyeDCNHNrok\n1mqyzFVs3w6bNiWj48ePT4LWpk3ZXMv25yBiezgfYlnJMlexalVSex42LPtr2f4cRGwP50MsK1mO\n3yjvlu6xIvXlIGJ7eIyIZSXL5qzyHz/u5ltfDiK2h2silpUsm5gcRBor0yAiaZ6kxyU9IeniXrYf\nKulOScskPSDpmJJtX5D0qKTlkm6RNCpdP0HSPZJ+L+luSeOzfA/txDkRy0qWX+zl961zIvWVWRCR\nNBy4FpgHHA2cI2lO2W6XAksi4jjgXGBBemwBuAA4MSKOBYYDZ6fHXALcExFHAj9PX1sNuDnLsjJ5\ncvKsj23ban9u50QaK8uayFzgyYjoiYgdwG3A6WX7zAEWA0TESqAgaRLwIrADGCNpBDAGWJce8wHg\npnT5JuCDGb6HtvHaa7BhQzITqlmtSTBrVjZjRcqbs2bN8liResoyiEwD1pS8XpuuK7UMOBNA0lxg\nFjA9Il4ArgZWA08DWyLi3vSYyRGxIV3eAEzOpvjtZe3apJ/9CM9hYBnJopnp5ZeT0fCTS74Fxo1L\nxjo9/3xtr2W9y/Iro5LfAVcACyQtBZYDS4Fdkt4AfBYoAFuAf5P0sYi4eZ8LRISkPq/T2dm5Z7mj\no4OOjo4q30L7cD7EspZFXqSnJxkNP6zs53AxYE2cWNvrtZquri66urqGdI4sg8g6oHQCjRkktZE9\nImIr8Inia0ndwB+A9wH/FRHPp+t/CLwVuBnYIGlKRKyXNBV4tq8ClAYR65/zIZa1LHIVfd23xWu9\n+c21vV6rKf9xffnll1d9jiybsx4CjpBUkDQSOAtYWLqDpHHpNiRdANwXEduAlcDJkkZLEnAq8Fh6\n2ELgvHT5POBHGb6HtuHuvZa1rGoivd237uZbP5kFkYjYCVwI3EUSAG6PiBWS5kuan+52NLBc0uPA\n/wA+kx77W+B7JIHokXTfb6X/XgG8W9LvgXemr22IHEQsa1nkRBxEGi/TNGpELAIWla27vmT5fuCo\nPo69Criql/UvkNRMrIacE7GsZdGc1d0NJ564//pCARYt2n+91Z5HrBvgnIhlb+JEeOWVpDdVrQyU\nE7HsOYgYr74KGzfC4Yc3uiTWyqSkhlDLsSJ9NWd5rEj9OIgYa9bAtGkeI2LZq2VeZNu2ZBT86163\n/7aDD06eorhxY22uZX1zEDHnQ6xuatnM1NOT1Dik3rd7Dq36cBAx50OsbmrZa2qg+9Z5kfpwEDF3\n77W6qXUQ6e++dTff+nAQMQcRq5taNjE5iOSDg4g5J2J1U8smpoHuW+dE6sNBxJwTsbqZMAF27oTN\nm4d+LudE8sFBpM298koyZfbUqY0uibWD4liRWny5D9ScVXx+iceKZMtBpM2tXg0zZsDw4Y0uibWL\nWgSRF19MfgD1N9X72LHJeJENG/rex4bOQaTNOR9i9VaLZqZiLaSvMSJFzotkz0GkzTkfYvVWi5pI\npfet8yLZcxBpc+7ea/VWi9pBpfetu/lmz0GkzTmIWL3VqibiIJIPDiJtzjkRq7diE9NQek11d1fe\nnOWcSLYcRNqccyJWb+PHJ/8OZayIayL54SDSxrZvT/5HnjKl0SWxdlIcKzKUGkKlQWTWrKQb++7d\ng7+W9c9BpI319MDMmTDMd4HV2VB6TW3enIx6nzBh4H1Hj05qPs88M7hr2cD89dHGnFS3RhlKM1Ox\nCXagMSJF7uabLQeRNuZ8iDXKUININT9+nBfJloNIG3NNxBplKDkRB5F8cRBpY+7ea40ylCamwQQR\nd/PNTqZBRNI8SY9LekLSxb1sP1TSnZKWSXpA0jHp+qMkLS352yLp0+m2TklrS7bNy/I9tDLXRKxR\nZs0a/FiRSseIFDknkq0RWZ1Y0nDgWuBUYB3woKSFEbGiZLdLgSURcYako4BvAKdGxErghPQ8w9Lj\n70yPCeCaiLgmq7K3C+dErFHGj4cRI+CFF+Cww6o71s1Z+ZJlTWQu8GRE9ETEDuA24PSyfeYAiwHS\nwFGQNKlsn1OBpyJiTcm6CvtlWF9eegm2boXJkxtdEmtXg2lmiqg+iMycCWvWwK5d1V3LKpNlEJkG\nlH7xr03XlVoGnAkgaS4wC5hets/ZwC1l6y5Km8BukDS+dkVuHz09SZNCpd0kzWptMM1MxVHu46v4\nv/7AA5PaztNPV3ctq0xmzVkkzU4DuQJYIGkpsBxYCuz5vSBpJHAaUJpPuQ74Urr8ZeBq4PzeTt7Z\n2blnuaOjg46OjooL3+qcD7FGG0wzUzEfUu2Pn2LAmjGjuuNaXVdXF11dXUM6R5ZBZB1Q+p9sBklt\nZI+I2Ap8ovhaUjfwh5Jd3gs8HBEbS455tmT/bwM/7qsApUHE9uV8iDVaoQArV1Z3zGB//BQD1tvf\nXv2xraz8x/Xll19e9TmybM56CDhCUiGtUZwFLCzdQdK4dBuSLgDui4htJbucA9xadkzp08DPIKnB\nWJVcE7FGG0xNZKhBxGovs5pIROyUdCFwFzAcuCEiVkian26/HjgauFFSAL+jpFlK0liSpPoFZae+\nUtLxJM1l3cD8rN5DK+vuhje9qdGlsHY2mJxITw+8/vXVX6tQgPvvr/44G1iWzVlExCJgUdm660uW\n7weO6uPYl4CJvaw/t8bFbEuuiVijlY4VqTTH0d0N73pX9deaPRtuvXXg/ax6HrHeppwTsUY75JCk\n59TGjQPvW+TmrPxxEGlDW7fCyy/DpPIROWZ1Vs2Xe3GMyKxZ1V9nxgxYty6ZQt5qy0GkDRV/zXmM\niDVaNXmR559PRrlXM0akaNSo5EfTunXVH2v9GzCISLqmOKeVtQbnQywvqqmJDLUJ1nNoZaOSmsgK\n4FuSfiPpU5LGZV0oy5bzIZYX1Ux9MtQfP86LZGPAIBIR/xwRbwPOBQrAckm3SPrTrAtn2XBNxPKi\nmtqBg0g+VZQTSWfkfSPJhIkbSea8+pyk2zMsm2XEzxGxvKjmi32o962fK5KNSnIiXwVWAn8G/G1E\nnBQRV0bEacDxWRfQas81EcuLWbNg1arKnivinEg+VTLY8BHg/6SD/8q9pcblsTpwTsTy4qCDYOxY\n2LABpkzpf183Z+VTJc1ZW4ADii8kjZf0QYCI2JxVwSwbW7bAa69V/yAgs6xUUkMYyhiRounT4Zln\nPFak1ioJIpeVBot0uTOzElmmPEbE8qaSGsLGjcno9kMOGfx1Ro5MHsK2Zs3A+1rlKgkivX3dDK91\nQaw+nA+xvKkkiNSqCdZ5kdqrJIg8nA44fIOkP0oT7Q9nXTDLhvMhljeV9Jqq1Y8f50Vqr5IgchGw\nA7id5DnprwB/lWWhLDuuiVjeVFI7cBDJrwF7Z6UPibp4oP2sOXR3wymnNLoUZntV8sXe3Q3H1GDy\npUIBFi8e+nlsr0rGibxO0lck/VTS4vTvF/UonNWeayKWN8WxIrt3972PcyL5VUlz1s3A48DrSXpl\n9ZA8+taakHMiljdjxsC4cbB+fd/7uDkrvyoJIodFxLeB1yLivoj4OPDOjMtlGdi8Ofm1d+ihjS6J\n2b76qyFEJDWVoYwRKZo+PRnY+NprQz+XJSoJIsWPe72k90s6EfDXUBMqzj3kMSKWN/3VEDZsSEa1\nH3TQ0K8zYgRMneqxIrVUybQn/1fSeODzwD8ChwB/nWmpLBPOh1he9dfNt9ZNsMVazxveULtztrN+\ng0g6e++REfETYDPQUY9CWTacD7G8KhTg4T5Gn9X6x4/zIrXVb3NWROwCzqlTWSxjrolYXvWXE3EQ\nybdKciK/lHStpLdLOlHSSWlexJqMnyNiedXfF3ut71s/V6S2KgkiJwDHAF8Crga+kv47IEnzJD0u\n6QlJ+w1YlHSopDslLZP0QPFZ7pKOkrS05G+LpE+n2yZIukfS7yXdneZrrAJuzrK8mjUrSXbv2rX/\ntqxyIlYblYxY7xjMidN8yrXAqcA64EFJCyNiRclulwJLIuIMSUcB3wBOjYiVJMELScPS4+9Mj7kE\nuCcirkoD0yXpn/WjOJW2ayKWRwceCBMmJFO1T5++7zY3Z+XbgEFE0mVAkMzmu+f5YxHxpQEOnQs8\nGRE96XluA04HSoPIHOCK9HwrJRUkTYqIjSX7nAo8FRHFTnkfAN6RLt8EdOEgMqBNm2DYMBjvepvl\nVPHLvTSI7N4Nq1fXZoxI0eGHJ1PLv/oqjBpVu/O2q0qas15K/7YBu0kek1uo4LhpQGlv7LXpulLL\ngDMBJM0FZgFlv0M4G7il5PXkiNiQLm8AJldQlrbnfIjlXW81hPXrk2eIjBlTu+uMGAHTpiXByYau\nkuasr5S+lvQPwN0VnLuCpyZzBbBA0lJgObAU2NMqKmkkcBp9TAAZESGpz+t0dnbuWe7o6KCjo6OC\nIrUm50Ms73pLeGd13xbzIkccUftzN5Ouri66urqGdI5KBhuWG8v+NYrerANmlLyeQVIb2SMitgKf\nKL6W1A38oWSX9wIPlzVvbZA0JSLWS5oKPNtXAUqDSLtzPsTybvZs+PWv912X1X3rvEii/Mf15Zdf\nXvU5KpnFd3nJ36PASmBBBed+CDgizXOMBM4CFpade1y6DUkXAPelU88XnQPcWnbehcB56fJ5wI8q\nKEvbcxCxvOvti91BJP8qqYmcVrK8E9gQETsGOigidkq6ELiL5HG6N0TECknz0+3XA0cDN6ZNUr8D\nzi8eL2ksSVL9grJTXwF8X9L5JDMKf7SC99D2urvh1FMbXQqzvvX2xd7dDSedlM21Fi2q/XnbUSVB\nZArwWES8CCDpEEknRsQDAx0YEYuARWXrri9Zvh84qo9jXwIm9rL+BZLgYlVwTsTybuZMWLs2GSsy\nfHiyrqcHPvzh2l/LY0Vqp5LeWd8k6ZlV9FK6zppEcYxILbtJmtXaqFEwaRKsW7d3nZuz8q+SIEJE\n7C5Z3kXSPGVN4vnnYeTI5ME/ZnlW+uW+e3cyin3mzNpfZ+rU5P+LV16p/bnbTSVBpFvSpyUdIGmk\npM+wbw8qyzmPEbFmUdrN9+mnkweojR5d++sMHw4zZiQPu7KhqSSIfAp4G0mX3bXAycBfZlkoqy3n\nQ6xZlNZEsr5vnRepjUoGG24g6Z5rTcrde61ZzJ4Nv/xlspz1feu8SG1UMk7ke6Uz5aYz734n22JZ\nLTmIWLMor4k4iORfJc1ZfxIRm4svImIT4OeJNBHnRKxZlOZEsr5v/VyR2qgkiEjShJIXE3DvrKbi\nnIg1ixkzkoT6zp3OiTSLSgYbXg3cL+n7JNPBfwT420xLZTXjMSLWTEaOhClTkkGHbs5qDpUk1r8n\n6WHgnSQz854REY9lXjKriY0bk2m0Dz640SUxq0yhAE89lQSSLMaIFE2ZAps3w/bt2XQjbheVDjZ8\nNCL+EfgZ8KF0IkZrAs6HWLMpFOBXv4KJE7N9aNSwYUmQcm1kaCrpnTVN0uckPUgySeJwkgdFWRNw\nPsSazezZsHhxfe5b50WGrs8gImm+pC7gHmA8yXM/nomIzohYXqfy2RC5e681m0IB7r+/Pvet8yJD\n119O5FqS5qvPRMQyAEl1KZTVTk8P/PEfN7oUZpUrFJLnnzuINIf+mrOmAj8Fvi5phaQvAwfUp1hW\nK86JWLMp3q/1CiIeKzI0fQaRiHguIq6LiHcA7wG2kDya9nFJf1e3EtqQOCdizWbGjGSCROdEmoMi\noroDpCOBsyPiS9kUqTYkRbXvrV4++EH4yU/qc60DD4QNG2Ds2Ppcz6wW3vIWuOMOmD492+s891xy\njZ07s73OUIwenQS6ww7L/lqSiIiq8hZVB5FmkecgMmsW3HtvfX5pSXufEmdm+9u1KxmUm1cnnwzf\n+EYSWLM2mCBSyYh1q6EdO2D9+qQtdoQ/fbOGy/uPrNmzk7xNPYLIYFQ02NBqZ82a5KlqB7iLgplV\nIO95mwF/C0s6iWS6k1JbgFURkeOWxHzyuA0zq0ahAI880uhS9K2SBpVvACcBxbdxLPAoME7S/4qI\nu7IqXCtyEDGzahQKsHBho0vRt0qas54Gjo+IkyLiJOB4kmesvxu4KsvCtSKP2zCzauR9LEslQeSo\niNgz4WI6g+8bI+Ip9m/m2oekeem4kickXdzL9kMl3SlpmaQHJB1Tsm28pB+kAx0fk/SWdH2npLWS\nlqZ/8yp+tzngcRtmVo1CAVavht27G12S3lXSnPWopOuA20ieJ/JR4DFJo4AdfR0kaTjJ1CmnAuuA\nByUtjIgVJbtdCiyJiDMkHUXSdHZqum0B8NOI+LCkEUBxpEMA10TENRW/yxxxc5aZVWPMGDjkkGS8\n19SpjS7N/iqpifwF8BTwWeAzJE1Z55EEkHf2c9xc4MmI6ImIHSRB6PSyfeYAiwEiYiVQkDRJ0jjg\n7RHxnXTbzojYUnJc007i5SBiZtXKc5PWgEEkIl6OiK9ExBnp31fSdbsjYms/h04D1pS8XpuuK7UM\nOBNA0lxgFjAdmA1slPRdSUsk/bOkMSXHXZQ2gd0gaXwF7zMXXn0Vnn0WppV/CmZm/cjzRJGVdPE9\nBbgMKJTsHxHx+gEOrWQM6BXAAklLgeXAUmAXMBI4EbgwIh6U9DXgEuCLwHVAccqVL5M8vvf83k7e\n2dm5Z7mjo4OOjo4KipSdNWuSAOJBhmZWjazGinR1ddHV1TWkcww47YmklSRNWUtIvuCBZILGAY47\nGeiMiHnp6y8AuyPiyn6O6SbpQnwQcH9EzE7XnwJcEhHvL9u/APw4Io7t5Vy5m/bk3nvh7/4OfvGL\nRpfEzJrJN78JS5bAt76V7XUGM+1JJTmRzRGxKCI2pDP7PjdQAEk9BBwhqSBpJHAWsE9vZ0nj0m1I\nugC4LyK2RcR6YE062SMkyfZH0/1KU0tnkNRgmoLzIWY2GHnOiVTSsLJY0j8APwReLa6MiCX9HRQR\nOyVdCNxF8kjdGyJihaT56fbrgaOBGyUFyaN3S5ulLgJuToPMU8DH0/VXSjqepLmsG5hfwXvIBY8R\nMbPByHNOpJLmrC56yW9ExJ9mVKaayGNz1sc+BvPmwZ//eaNLYmbNZPt2OPRQePllGJbhjIeZzOIb\nER2DLpHtw81ZZjYYo0cnQeSZZ/LXu7PPICLpzyPiXyR9nn1rIiLpndWUg/0ayUHEzAarmBfJWxDp\nr2JUHJdxcNnfQem/VoVXXkmeonb44Y0uiZk1o7zmRfqsiaSJb4B7I+KXpdvSLrdWhdWr9z472sys\nWnl9rkglKZp/7GXd12tdkFbnpiwzG4qmq4lI+m/AW4FJkj7H3vmqDibpsmtVcBAxs6EoFOD22xtd\niv311ztrJHsDRmkO5EXgw1kWqhV5jIiZDUVem7P6y4ncB9wn6bsRsQr2TO9+UNmMulaBnh54//sH\n3M3MrFczZ8LatbBrV75yq5XkRP5e0iGSxpJMMfKYpL/JuFwtx81ZZjYUo0bBxInw9NONLsm+Kgki\nx0TEi8AHgUUks/l6zHWVHETMbKjyOIdWJUFkhKQDSILIj9MHTOVrPpGc274dNm3K51PJzKx55DEv\nUkkQuR7oIRlk+B/p9OvOiVRh1aqkPTPLOW/MrPXlsZtvJU82/HpETIuI90bEbmAVkOvJF/PGTVlm\nVgtN2ZwlaUr6GNqfpavmkDxj3Srk7r1mVgtNWRMBbgTuBoqzPj0B/HVWBWpFromYWS00VU5EUnEM\nycSIuJ300bhpYn1nHcrWMnp6kv/4ZmZDMWNG0sV3Z46+gfurifwm/XebpInFlemz051Yr4JrImZW\nCyNHwutelww6zIv+pj0pzpX1eeDfgddL+i9gEp72pCrOiZhZrRTzInn5TukviJROvHgn8NN0+VXg\nXcCy7IvX/F56CbZuhcmTG10SM2sFecuL9BdEyideLBrTyzrrw6pVMGuWx4iYWW3krYdWf0FkfURc\nXreStKg8VTvNrPkVCnDffY0uxV7+fZwx50PMrJbyVhPpL4icWrdStDDXRMyslvKWE+kziETE80M9\nuaR5kh6X9ISki3vZfqikOyUtk/SApGNKto2X9ANJKyQ9lnYtRtIESfdI+r2kuyWNH2o5s+QxImZW\nS9Onw/r1sGNHo0uSyKw5K32A1bXAPOBo4BxJc8p2uxRYEhHHAecCC0q2LQB+GhFzgD8BVqTrLwHu\niYgjgZ+nr3PLNREzq6UDDoApU2DNmkaXJJFlTmQu8GRE9KSj3G8DTi/bZw6wGCAiVgIFSZMkjQPe\nHhHfSbftLHma4geAm9Llm0imqM8t50TMrNby1KSVZRCZBpTGyrXpulLLgDMBJM0FZgHTgdnARknf\nlbRE0j9LKnYtnhwRG9LlDUBuR2Bs3Qovv5yMMDUzq5U8Jdf76+I7VJU8uOoKYIGkpSSP3l1KMkfX\nSOBE4MKIeFDS10iarb64zwUiQlKf1+ns7Nyz3NHRQUdHR5VvYWhWrUr+Y0sD7mpmVrFaBZGuri66\nurqGdI4sg8g6YEbJ6xkktZE9ImIr8Inia0ndwB9IHoC1NiIeTDfdARQT8xskTYmI9ZKmAs/2VYDS\nINIIzoeYWRYKBfj5z4d+nvIf15dfXv3QwCybsx4CjpBUkDQSOAtYWLqDpHHpNiRdANwXEdsiYj2w\nRtKR6a7vAh5Nlxey93km5wE/yvA9DInzIWaWhTzlRDKriUTETkkXAneRTKFyQ0SskDQ/3X49Sa+t\nG9Mmqd8B55ec4iLg5jTIPAV8PF1/BfB9SeeTPLb3o1m9h6FyTcTMspCnnIgiKkldNB9J0ej39qEP\nwdlnw0c+0tBimFmL2bkTxo6FF1+EUaNqd15JRERVWVxPe5Ih10TMLAsjRsDhh+djrIiDSIacEzGz\nrOQlL+IgkpEtW+DVV2HixIH3NTOrVl7yIg4iGVm1Kvml4DEiZpaFQiFp7Wg0B5GMOB9iZllyTaTF\nOR9iZllyTqTFuSZiZllyTaTF+TkiZpalww+H556DV15pbDkcRDLimoiZZWn48OQBVatXN7YcDiIZ\ncU7EzLKWh7yIg0gGNm+GXbtgwoRGl8TMWlke8iIOIhko5kM8RsTMspSHsSIOIhlwPsTM6sHNWS3K\n+RAzqwc3Z7Uo10TMrB4cRFqUx4iYWT1MnQqbNsH27Y0rg4NIBlwTMbN6GDYMZs5MJnxtWBkad+nW\nFOGciJnVT6ObtBxEamzTpqRr7/jxjS6JmbWDRnfzdRCpMY8RMbN6ck2kxTgfYmb11OixIg4iNeZ8\niJnVk2siLcbde82snlo6JyJpnqTHJT0h6eJeth8q6U5JyyQ9IOmYkm09kh6RtFTSb0rWd0pam65f\nKmlelu+hWm7OMrN6mjwZtm6Fl15qzPUzCyKShgPXAvOAo4FzJM0p2+1SYElEHAecCywo2RZAR0Sc\nEBFzy9Zfk64/ISJ+ltV7GAwHETOrp2HDYNasxo0VybImMhd4MiJ6ImIHcBtwetk+c4DFABGxEihI\nmlSyva8+Trns++QxImbWCI3Mi2QZRKYBa0per03XlVoGnAkgaS4wC5iebgvgXkkPSbqg7LiL0iaw\nGyTlZkTJd1hWAAAKqklEQVTG88/DyJEwblyjS2Jm7aSReZERGZ47KtjnCmCBpKXAcmApsCvddkpE\nPJ3WTO6R9HhE/CdwHfCldJ8vA1cD5/d28s7Ozj3LHR0ddHR0DOJtVM5NWWbWCIOtiXR1ddHV1TWk\nayuiku/6QZxYOhnojIh56esvALsj4sp+jukGjo2IbWXrLwO2RcTVZesLwI8j4thezhVZvbe+/OAH\ncMst8MMf1vWyZtbmbr89+f75t38b2nkkERFVpQuybM56CDhCUkHSSOAsYGHpDpLGpdtIm6zui4ht\nksZIOjhdPxZ4D0lNBUlTS05xRnF9HjgfYmaN0MicSGbNWRGxU9KFwF3AcOCGiFghaX66/XqSXls3\nSgrgd+xtlpoM3Klk7pARwM0RcXe67UpJx5M0l3UD87N6D9Xq6YE3vrHRpTCzdtPInEhmzVmN1ojm\nrPe9Dz71KTjttLpe1szaXASMHQsbNsDBBw/+PHlrzmo7TqybWSNIyXdPI8aKOIjUSISDiJk1TqPy\nIg4iNbJxI4wePbSqpJnZYDUqL+IgUiOuhZhZIzVqSngHkRpxEDGzRnJzVpPzGBEzayQ3ZzU5P0fE\nzBrJNZEm5+YsM2ukiRPhtddgy5b6XtdBpEYcRMyskRo1VsRBpAaKY0RmzWp0ScysnTUiL+IgUgPF\nqQYOOqjRJTGzdtaIvIiDSA24KcvM8qARY0UcRGrAQcTM8sA1kSblMSJmlgfOiTQpjxExszxwTaRJ\nuTnLzPJgwgTYvRs2b67fNR1EasDNWWaWB8WxIvWsjTiIDNHu3bB6tceImFk+1Dsv4iAyROvXw/jx\nMGZMo0tiZlb/br4OIkPkfIiZ5Ymbs5qM8yFmlicOIk3GNREzy5OWyolImifpcUlPSLq4l+2HSrpT\n0jJJD0g6pmRbj6RHJC2V9JuS9RMk3SPp95LuljQ+y/cwEI8RMbM8KeZEIupzvcyCiKThwLXAPOBo\n4BxJc8p2uxRYEhHHAecCC0q2BdARESdExNyS9ZcA90TEkcDP09cN0ww1ka6urkYXIRf8Oezlz2Kv\nVvssxo+HYcNg06b6XC/Lmshc4MmI6ImIHcBtwOll+8wBFgNExEqgIGlSyXb1ct4PADelyzcBH6xp\nqavUDDmRVvufZLD8Oezlz2KvVvws6tmklWUQmQasKXm9Nl1XahlwJoCkucAsYHq6LYB7JT0k6YKS\nYyZHxIZ0eQMwudYFr9SuXbBmjceImFm+1DO5PiLDc1fSIncFsEDSUmA5sBTYlW47JSKeTmsm90h6\nPCL+c58LRISkmrT83Xgj3HFHdcfs2AGHHQYHHliLEpiZ1cbrXw+dncn3WtYUGWVfJJ0MdEbEvPT1\nF4DdEXFlP8d0A8dGxLay9ZcBWyPiGkmPk+RK1kuaCiyOiDf2cq46pZXMzFpHRPSWRuhTljWRh4Aj\nJBWAp4GzgHNKd5A0DtgeEa+lTVb3RcQ2SWOA4RGxVdJY4D3A5elhC4HzgCvTf3/U28Wr/SDMzKx6\nmQWRiNgp6ULgLmA4cENErJA0P91+PUmvrRvTWsPvgPPTwycDd0oqlvHmiLg73XYF8H1J5wM9wEez\neg9mZta/zJqzzMys9bXciPWBBji2k74GbLYDSd+RtEHS8pJ1uRqoWi99fBadktam98ZSSfMaWcZ6\nkTRD0mJJj0r6naRPp+vb7t7o57Oo6t5oqZpIOsBxJXAqsA54EDgnIlY0tGANknZUOCkiXmh0WepN\n0tuBbcD3IuLYdN1VwHMRcVX6A+PQiGjoYNV66OOz2NNZpaGFqzNJU4ApEfFbSQcBD5OMNfs4bXZv\n9PNZfJQq7o1Wq4lUMsCx3bRlB4O0O3j5mN1cDVStlz4+C2jDeyMi1kfEb9PlbcAKkvFrbXdv9PNZ\nQBX3RqsFkUoGOLaTvgZstqvcDFTNiYvSeetuaIfmm3Jpz9ETgAdo83uj5LP4dbqq4nuj1YJI67TN\n1cbbIuIE4L3AX6XNGkYyUJX2vl+uA2YDxwPPAFc3tjj1lTbf3AF8JiK2lm5rt3sj/Sx+QPJZbKPK\ne6PVgsg6YEbJ6xkktZG2FBHPpP9uBO4kae5rZxvSdmDSgarPNrg8DRMRz0YK+DZtdG9IOoAkgPxL\nRBTHmbXlvVHyWfxr8bOo9t5otSCyZ4CjpJEkAxwXNrhMDSFpjKSD0+XigM3l/R/V8ooDVaGfgart\nIP2iLDqDNrk3lAw+uwF4LCK+VrKp7e6Nvj6Lau+NluqdBSDpvcDX2DvA8e8bXKSGkDSbpPYBewds\nts1nIelW4B3ARJI27i8C/w58H5hJOlA1IjY3qoz10stncRnQQdJcEUA3ML8kJ9CyJJ0C/AfwCHub\nrL4A/IY2uzf6+CwuJZlZpOJ7o+WCiJmZ1U+rNWeZmVkdOYiYmdmgOYiYmdmgOYiYmdmgOYiYmdmg\nOYiYmdmgOYhYU5P0C0nvKVv3WUn/1M8xXZJOyrhct6ZzD32mbH2npM+nywem049/sZfjPyLpMUk/\nH0IZtpUs/5mklZJmpmV4SdKkPvbdLekrJa//dzrrr9l+HESs2d0KnF227izgln6OyXRupHT6jDdF\nxHERsaC3a6czKtwBPBgRX+rlNOcDn4yId1V4zd6eUhrptncBC4B5EbE63fYc8PnyfVOvAWdIOqyX\nbWb7cBCxZncH8L7il2g6G+nhEfFLSddJejB94E5nbweX/QL/sKTvpsuTJP1A0m/Sv7f2cuyBkr6r\n5MFfSyR1pJvuBqalD/Q5pZfLHkDymIKVEXFpL+f9IvA24DuSrpQ0qrfrSPoLSQvT2so9fby//w58\nC3hfRHSnqwP4DnBWHzO07kiP+evezmlWykHEmlr6wK3fAH+WrjobuD1dvjQi3gwcB7xD0rG9naKP\n5QXAVyNiLvBhkonoyv0VsCsi/oRkqoib0hrGacBTEXFCRPyy7BgBfwO8GhGf6+M9fYlkHrj/GREX\nAxf2cp1R6e4nAB+KiD/t5VQHkkx9c3pE/L5s2zaSQPLZ3soA/BPwMUmH9LHdDHAQsdZQ2qR1Vvoa\nkl/aDwNLgGOAOVWc81TgWklLSebcOljSmLJ93gb8K0BErARWAUfS/wN9Avgl8FZJR1RYlr6uE8A9\n/czx9BrwK+CTfZTj68B56VTg+25Mpkf/HvDpCstobcpBxFrBQuBdkk4AxkTE0nQCys8D74yI44D/\nR/LLvFxp7WN0ybKAt6S1iRMiYkZEvNzL8YN5OuB/kDQVLSpOP16Bvq7zUj/H7CZ51OlcSV8oP19E\nbCHJHV3Yx/FfI8nNjK2wjNaGHESs6aUP0lkMfJe9CfVDSL5gX5Q0meTBXL3ZIOmNkoaRTHtdDCp3\nU/IrXNLxvRz7n8DH0u1HkswAu7LCMv8Q+ArwM0njBti9t+s8TgUBLCJeAd5H0jT1iV52uQaYTzLT\nc/mxm0hmtj0fJ9etDw4i1ipuBY5N/yUilgFLSb5sbyZpQurNJcBPSJp9ni5Z/2ngTWk33UeBv+zl\n2H8Chkl6hCRRfl5E7Ei39felG2kZv0mSs1hYkuPoTV/XGaiXWfE6m4B5wP+RdFrZtueBHwIjy49L\nXU0yhbxZrzwVvJmZDZprImZmNmgOImZmNmgOImZmNmgOImZmNmgOImZmNmgOImZmNmgOImZmNmgO\nImZmNmj/H1ZYEx17MD37AAAAAElFTkSuQmCC\n",
       "text": [
        "<matplotlib.figure.Figure at 0x17208e48>"
       ]
      }
     ],
     "prompt_number": 17
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "- **Training accuracy** rises as model complexity increases\n",
      "- **Testing accuracy** penalizes models that are too complex or not complex enough\n",
      "- For KNN models, complexity is determined by the **value of K** (lower value = more complex)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Making predictions on out-of-sample data"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# instantiate the model with the best known parameters\n",
      "knn = KNeighborsClassifier(n_neighbors=11)\n",
      "\n",
      "# train the model with X and y (not X_train and y_train)\n",
      "knn.fit(X, y)\n",
      "\n",
      "# make a prediction for an out-of-sample observation\n",
      "knn.predict([3, 5, 4, 2])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 18,
       "text": [
        "array([1])"
       ]
      }
     ],
     "prompt_number": 18
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Downsides of train/test split?"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "- Provides a **high-variance estimate** of out-of-sample accuracy\n",
      "- **K-fold cross-validation** overcomes this limitation\n",
      "- But, train/test split is still useful because of its **flexibility and speed**"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Resources\n",
      "\n",
      "- Quora: [What is an intuitive explanation of overfitting?](http://www.quora.com/What-is-an-intuitive-explanation-of-overfitting/answer/Jessica-Su)\n",
      "- Video: [Estimating prediction error](https://www.youtube.com/watch?v=_2ij6eaaSl0&t=2m34s) (12 minutes, starting at 2:34) by Hastie and Tibshirani\n",
      "- [Understanding the Bias-Variance Tradeoff](http://scott.fortmann-roe.com/docs/BiasVariance.html)\n",
      "    - [Guiding questions](https://github.com/justmarkham/DAT5/blob/master/homework/06_bias_variance.md) when reading this article\n",
      "- Video: [Visualizing bias and variance](http://work.caltech.edu/library/081.html) (15 minutes) by Abu-Mostafa"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Comments or Questions?\n",
      "\n",
      "- Email: <kevin@dataschool.io>\n",
      "- Website: http://dataschool.io\n",
      "- Twitter: [@justmarkham](https://twitter.com/justmarkham)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from IPython.core.display import HTML\n",
      "def css_styling():\n",
      "    styles = open(\"styles/custom.css\", \"r\").read()\n",
      "    return HTML(styles)\n",
      "css_styling()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<style>\n",
        "    @font-face {\n",
        "        font-family: \"Computer Modern\";\n",
        "        src: url('http://mirrors.ctan.org/fonts/cm-unicode/fonts/otf/cmunss.otf');\n",
        "    }\n",
        "    div.cell{\n",
        "        width: 90%;\n",
        "/*        margin-left:auto;*/\n",
        "/*        margin-right:auto;*/\n",
        "    }\n",
        "    ul {\n",
        "        line-height: 145%;\n",
        "        font-size: 90%;\n",
        "    }\n",
        "    li {\n",
        "        margin-bottom: 1em;\n",
        "    }\n",
        "    h1 {\n",
        "        font-family: Helvetica, serif;\n",
        "    }\n",
        "    h4{\n",
        "        margin-top: 12px;\n",
        "        margin-bottom: 3px;\n",
        "       }\n",
        "    div.text_cell_render{\n",
        "        font-family: Computer Modern, \"Helvetica Neue\", Arial, Helvetica, Geneva, sans-serif;\n",
        "        line-height: 145%;\n",
        "        font-size: 130%;\n",
        "        width: 90%;\n",
        "        margin-left:auto;\n",
        "        margin-right:auto;\n",
        "    }\n",
        "    .CodeMirror{\n",
        "            font-family: \"Source Code Pro\", source-code-pro,Consolas, monospace;\n",
        "    }\n",
        "/*    .prompt{\n",
        "        display: None;\n",
        "    }*/\n",
        "    .text_cell_render h5 {\n",
        "        font-weight: 300;\n",
        "        font-size: 16pt;\n",
        "        color: #4057A1;\n",
        "        font-style: italic;\n",
        "        margin-bottom: 0.5em;\n",
        "        margin-top: 0.5em;\n",
        "        display: block;\n",
        "    }\n",
        "\n",
        "    .warning{\n",
        "        color: rgb( 240, 20, 20 )\n",
        "        }\n",
        "</style>\n",
        "<script>\n",
        "    MathJax.Hub.Config({\n",
        "                        TeX: {\n",
        "                           extensions: [\"AMSmath.js\"]\n",
        "                           },\n",
        "                tex2jax: {\n",
        "                    inlineMath: [ ['$','$'], [\"\\\\(\",\"\\\\)\"] ],\n",
        "                    displayMath: [ ['$$','$$'], [\"\\\\[\",\"\\\\]\"] ]\n",
        "                },\n",
        "                displayAlign: 'center', // Change this to 'center' to center equations.\n",
        "                \"HTML-CSS\": {\n",
        "                    styles: {'.MathJax_Display': {\"margin\": 4}}\n",
        "                }\n",
        "        });\n",
        "</script>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 1,
       "text": [
        "<IPython.core.display.HTML at 0x3b1d1d0>"
       ]
      }
     ],
     "prompt_number": 1
    }
   ],
   "metadata": {}
  }
 ]
}